Program Comprehension
AI-Guided Exploration of Large-Scale Codebases
Understanding large-scale, complex software systems is a major challenge for developers, who spend a significant portion of their time on program comprehension. Traditional tools such as static visualizations and reverse engineering techniques provide structural insights but often lack interactivity, adaptability, and integration with contextual information. Recent advancements in large language models (LLMs) offer new opportunities to enhance code exploration workflows, yet their lack of grounding and integration with structured views limits their effectiveness. This work introduces a hybrid approach that integrates deterministic reverse engineering with LLM-guided, intent-aware visual exploration. The proposed system combines UML-based visualization, dynamic user interfaces, historical context, and collaborative features into an adaptive tool for code comprehension. By interpreting user queries and interaction patterns, the LLM helps developers navigate and understand complex codebases more effectively. A prototype implementation for Java demonstrates the feasibility of this approach. Future work includes empirical evaluation, scaling to polyglot systems, and exploring GUI-driven LLM interaction models. This research lays the groundwork for intelligent, interactive environments that align with developer cognition and collaborative workflows.
SCALAR: A Part-of-speech Tagger for Identifiers
Newman, Christian D., Scholten, Brandon, Testa, Sophia, Behler, Joshua A. C., Banabilah, Syreen, Collard, Michael L., Decker, Michael J., Mkaouer, Mohamed Wiem, Zampieri, Marcos, AlOmar, Eman Abdullah, Alsuhaibani, Reem, Peruma, Anthony, Maletic, Jonathan I.
The paper presents the Source Code Analysis and Lexical Annotation Runtime (SCALAR), a tool specialized for mapping (annotating) source code identifier names to their corresponding part-of-speech tag sequence (grammar pattern). SCALAR's internal model is trained using scikit-learn's GradientBoostingClassifier in conjunction with a manually-curated oracle of identifier names and their grammar patterns. This specializes the tagger to recognize the unique structure of the natural language used by developers to create all types of identifiers (e.g., function names, variable names, etc.). SCALAR's output is compared with a previous version of the tagger, as well as a modern off-the-shelf part-of-speech tagger, to show how it improves upon other taggers' output for annotating identifiers. The code is available on GitHub.
Index Terms--Program comprehension, identifier naming, part-of-speech tagging, natural language processing, software maintenance, software evolution
I. INTRODUCTION
The identifiers developers create represent a significant amount of the information other developers must use to understand related code. Given that identifiers represent, on average, 70% of the characters in a code base [1], and developers spend more time reading code than writing it [2], [3], it is important for researchers to better understand how identifiers convey information, and how they can be improved to increase developer reading efficiency.
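As an illustration (not taken from the paper), the two core steps SCALAR performs on an identifier, splitting it into words and mapping the words to a part-of-speech sequence, can be sketched as follows. The tiny lookup table stands in for SCALAR's trained GradientBoostingClassifier, and the tag names (V, NM, N) follow the grammar-pattern convention; the splitting rules are a common heuristic, not necessarily SCALAR's exact splitter.

```python
import re

def split_identifier(name: str) -> list[str]:
    """Split an identifier into its constituent words.
    Handles snake_case and camelCase/PascalCase boundaries."""
    parts = []
    for chunk in name.split("_"):
        # Break on uppercase runs and lowercase runs within a chunk.
        parts.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", chunk))
    return [p.lower() for p in parts if p]

# Toy stand-in for the trained model: a hand-labeled lookup.
# V = verb, NM = noun modifier, N = noun (grammar-pattern tags).
TOY_ORACLE = {
    "get": "V", "user": "NM", "count": "N",
    "is": "V", "valid": "NM", "token": "N",
}

def grammar_pattern(identifier: str) -> str:
    """Map an identifier to its part-of-speech tag sequence."""
    words = split_identifier(identifier)
    # Default unknown words to N, the most common tag for identifiers.
    return " ".join(TOY_ORACLE.get(w, "N") for w in words)

print(grammar_pattern("getUserCount"))  # V NM N
```

A real tagger replaces the lookup with a classifier over features of each word and its position, which is what SCALAR's gradient-boosting model learns from the curated oracle.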
KEN: Kernel Extensions using Natural Language
Zheng, Yusheng, Yang, Yiwei, Chen, Maolin, Quinn, Andrew
The ability to modify and extend an operating system is an important feature for improving a system's security, reliability, and performance. The extended Berkeley Packet Filter (eBPF) ecosystem has emerged as the standard mechanism for extending the Linux kernel and has recently been ported to Windows. eBPF programs inject new logic into the kernel that the system will execute before or after existing logic. While the eBPF ecosystem provides a flexible mechanism for kernel extension, it is difficult for developers to write eBPF programs today. An eBPF developer must have deep knowledge of the internals of the operating system to determine where to place logic, and must cope with programming limitations on the control flow and data accesses of their eBPF program enforced by the eBPF verifier. This paper presents KEN, an alternative framework that alleviates the difficulty of writing an eBPF program by allowing Kernel Extensions to be written in Natural language. KEN uses recent advances in large language models (LLMs) to synthesize an eBPF program given a user's English language prompt. To ensure that the LLM's output is semantically equivalent to the user's prompt, KEN employs a combination of LLM-empowered program comprehension, symbolic execution, and a series of feedback loops. KEN's key novelty is the combination of these techniques. In particular, the system uses symbolic execution in a novel structure that allows it to combine the results of program synthesis and program comprehension, building on the recent success that LLMs have shown for each of these tasks individually. To evaluate KEN, we developed a new corpus of natural language prompts for eBPF programs. We show that KEN produces correct eBPF programs on 80% of the prompts, an improvement of a factor of 2.67 over an LLM-empowered program synthesis baseline.
Enhancing Programming eTextbooks with ChatGPT Generated Counterfactual-Thinking-Inspired Questions
Narayanan, Arun Balajiee Lekshmi, Hendrawan, Rully Agus, V, Venktesh
Digital textbooks have become an integral part of everyday learning tasks. In this work, we consider the use of digital textbooks for programming classes. Generally, students struggle to make the most of programming textbooks, a possible reason being that the example programs provided as illustrations of concepts in these textbooks don't offer sufficient interactivity, and are therefore not sufficiently motivating for students to explore or understand these programming examples better. In our work, we explore the idea of enhancing the navigability of intelligent textbooks with the use of "counterfactual" questions, to make students think critically about these programs and enhance program comprehension. Inspired by previous work on nudging students toward counterfactual thinking, we present the possibility of enhancing digital textbooks with questions generated using GPT-3.5.
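As a sketch of this idea (the prompt wording below is illustrative, not the one used in the paper), generating counterfactual questions for a textbook example amounts to wrapping the example program and its concept in a "what if" instruction for the model:

```python
def counterfactual_prompt(code: str, concept: str) -> str:
    """Build a prompt asking an LLM (e.g., GPT-3.5) for
    counterfactual questions about a textbook example program.
    The exact wording is a hypothetical illustration."""
    return (
        "You are helping students reflect on a programming example "
        f"that illustrates {concept}.\n"
        "Generate three counterfactual ('what if ...') questions, each "
        "asking how the program's behavior would change under a "
        "specific modification.\n\n"
        f"Program:\n{code}\n"
    )

example = "for i in range(3):\n    print(i)"
prompt = counterfactual_prompt(example, "for loops")
print(prompt)
```

The prompt string would then be sent to the model through the provider's chat API; the response, a list of questions such as "What if `range(3)` were `range(1, 3)`?", is inserted alongside the example in the eTextbook.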
From Code Complexity Metrics to Program Comprehension
Code is hardly ever developed from scratch. Rather, new code typically needs to integrate with existing code and is dependent upon existing libraries. Two recent studies found that developers spend, on average, 58% and 70% of their time trying to comprehend code but only 5% of their time editing it [32, 51]. This implies that reading and understanding code is very important, both as an enabler of development and as a major cost factor during development. But as anyone who tries to read code can attest, it is hard to understand code written by others. This is commonly attributed, at least in part, to the code's complexity: the more complex the code, the harder it is to understand, and by implication, to work with. Identifying and dealing with complexity is considered important because the code's complexity may slow down developers and may even cause them to misunderstand it--possibly leading to programming errors. Conversely, simplicity is often extolled as vital for code quality. To gain a sound understanding of code complexity and its consequences, we must operationalize this concept. This means we need to devise ways to characterize it, ideally in a quantitative manner. And indeed, many metrics have been suggested for code complexity. Such metrics can then be used for either of two purposes. In industry, metrics are used to make predictions regarding code quality and development effort.
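To make "operationalizing complexity" concrete, here is a minimal sketch (not from the article) of one of the best-known such metrics, McCabe-style cyclomatic complexity, counted over a Python AST. The set of nodes counted as decision points is one common variant; published tools differ slightly in which constructs they count.

```python
import ast

# Decision points that add an independent path through the code.
_BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler,
                 ast.IfExp, ast.comprehension)

def cyclomatic_complexity(source: str) -> int:
    """McCabe-style count: 1 plus the number of decision points.
    'and'/'or' chains add one per extra operand."""
    tree = ast.parse(source)
    count = 1
    for node in ast.walk(tree):
        if isinstance(node, _BRANCH_NODES):
            count += 1
        elif isinstance(node, ast.BoolOp):
            count += len(node.values) - 1
    return count

snippet = """
def classify(x):
    if x < 0:
        return "negative"
    elif x == 0:
        return "zero"
    return "positive"
"""
print(cyclomatic_complexity(snippet))  # 3: one path plus two branches
```

A metric like this yields exactly the kind of quantitative characterization the article describes: a number per function that can feed predictions of quality and effort, whatever its limits as a proxy for human comprehension difficulty.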
A Survey on Machine Learning Techniques for Source Code Analysis
Sharma, Tushar, Kechagia, Maria, Georgiou, Stefanos, Tiwari, Rohit, Vats, Indira, Moazen, Hadi, Sarro, Federica
The advancements in machine learning techniques have encouraged researchers to apply these techniques to a myriad of software engineering tasks that use source code analysis, such as testing and vulnerability detection. Such a large number of studies hinders the community from understanding the current research landscape. This paper aims to summarize the current knowledge in applied machine learning for source code analysis. We review studies belonging to twelve categories of software engineering tasks and the corresponding machine learning techniques, tools, and datasets that have been applied to solve them. To do so, we conducted an extensive literature search and identified 479 primary studies published between 2011 and 2021. We summarize our observations and findings with the help of the identified studies. Our findings suggest that the use of machine learning techniques for source code analysis tasks is consistently increasing. We synthesize commonly used steps and the overall workflow for each task and summarize the machine learning techniques employed. We identify a comprehensive list of available datasets and tools usable in this context. Finally, the paper discusses perceived challenges in this area, including the availability of standard datasets, reproducibility and replicability, and hardware resources.
Studying Programming in the Neuroage
"This is a crazy idea," the review read. Closing my laptop lid, I added in my mind "and ... it will never work," as a lump welled in my throat. What we were proposing to do was simple yet ambitious. Using functional magnetic resonance imaging, we might better understand what goes on in the minds of programmers as they read and understand code. We had performed pilot experiments with a neurobiologist, obtained promising results, and received encouraging words from colleagues and reviewers.
Towards Generation of Visual Attention Map for Source Code
Itoh, Takeshi D., Kubo, Takatomi, Ikeda, Kiyoka, Maruno, Yuki, Ikutani, Yoshiharu, Hata, Hideaki, Matsumoto, Kenichi, Ikeda, Kazushi
Program comprehension is a dominant process in software development and maintenance. Experts are considered to comprehend source code efficiently by directing their gaze, or attention, to important components in it. However, reflecting the importance of components remains an open issue in gaze-behavior analysis for source code comprehension. Here we show a conceptual framework to compare the quantified importance of source code components with the gaze behavior of programmers. We use "attention" in attention models (e.g., code2vec) as the importance index for source code components and evaluate programmers' gaze locations based on the quantified importance. In this report, we introduce the idea of our gaze behavior analysis using the attention map, and the results of a preliminary experiment.
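One simple way to realize the comparison this framework proposes is to rank source-code components by model attention and by fixation time, then check how much the top-ranked sets agree. The sketch below uses invented component names, attention weights, and fixation durations purely for illustration; it is not the paper's analysis.

```python
# Hypothetical attention weights (e.g., from code2vec) and fixation
# durations (ms) for the same source-code components; values invented.
components = ["validate", "input", "loop_body", "return_stmt"]
attention  = [0.45, 0.10, 0.35, 0.10]
fixation   = [820, 150, 600, 130]

def top_k(names, scores, k):
    """Return the k components with the highest scores."""
    ranked = sorted(zip(scores, names), reverse=True)
    return {name for _, name in ranked[:k]}

# Components that both the model and the programmer weight highly.
k = 2
overlap = top_k(components, attention, k) & top_k(components, fixation, k)
print(sorted(overlap))  # ['loop_body', 'validate']
```

High overlap would suggest that model attention is a usable proxy for the importance that drives expert gaze; rank-correlation measures over the full lists are a natural refinement of this top-k comparison.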